Soil Transmitted Helminths are intestinal worms that are transmitted through feces. The feces of an infected person may contain the eggs of the worms. When an infected person defecates in an open area, or if the feces is used as manure, it may contaminate food or other items that humans may touch or consume. A person gets infected if they ingest the eggs. There are three types of helminths – Roundworms, Whipworms and Hookworms (CDC, 2022).
The focus of this study is on STH infections in Kenya. Worm in Kenya are a significant public health burden and may reduce long-term income of individuals by upto 32% (Baird et al., 2015). From the literature on helminths transmission, I identify two exposure covariates – Temperature and Nightlights. STH prevalence is likely to be higher in regions with higher temperature (Pullan et al., 2011). Poverty levels are also a risk factor (Scholte et al., 2013) as high poverty levels may correlate with unhygienic sanitation which in turn may be the underlying risk factor. Nightlights data may serve as a remote-sensed proxy for poverty levels in a region (Jean et al., 2016).
In this study, I analyze the prevalence of Roundworms (Ascaris lumbricoides) in Kenya. I intend to identify any spatial relationship in the prevalence, as well as estimate the correlation between the covariates and prevalence rates. I use temperature data from World Climate website, and nightlights data (DMSP) provided by the NOAA.
In this study, I analyze the prevalence of Roundworms (Ascaris lumbricoides) in Kenya. The sample size is 67 observations. The data includes a count of stools examined and a count of stools that tested positive for Roundworms.
I use two covariates in the analysis – (1) Temperature and (2)
Nightlights (as a proxy for poverty). I use the temperature data
provided by World Climate (bio1 variable) and I use the Nightlights data
provided by NOAA. Both datasets are in raster form and I
extract the covariate values in the locations for which
Roundworms data is available.
I first conduct a clustering activity, at both the Global as well as the Local level. I use the Global Moran’s I statistic to quantify the extent of Global clustering. I also calculate the statistic at various spatial lags and assess them for statistical significance.
For local clustering, I use the Local Moran’s I statistic. I identify the points at which the statistic value is higher than two standard deviations and I plot those points on the map. I then conduct a regression analysis to identify the impact of the covariates (temperature and nightlights) on the outcome variable.
I first conduct a non-spatial regression analysis. I analyze the residuals for spatial autocorrelation using Moran’s I Statistic.
I then conduct a spatial regression analysis. I use the Matern’s function for the spatially correlated random effect. I analyze the autocorrelation in errors in the spatial model using Moran’s I Statistic.
## Warning: package 'dplyr' was built under R version 4.1.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Warning: package 'raster' was built under R version 4.1.3
## Loading required package: sp
## Warning: package 'sp' was built under R version 4.1.2
##
## Attaching package: 'raster'
## The following object is masked from 'package:dplyr':
##
## select
## Warning: package 'ape' was built under R version 4.1.3
##
## Attaching package: 'ape'
## The following objects are masked from 'package:raster':
##
## rotate, zoom
## Warning: package 'wesanderson' was built under R version 4.1.3
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:raster':
##
## intersect, union
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
## Warning: package 'leaflet' was built under R version 4.1.3
## Warning: package 'pgirmess' was built under R version 4.1.3
## Registered S3 method overwritten by 'spdep':
## method from
## plot.mst ape
## Warning: package 'geoR' was built under R version 4.1.3
## --------------------------------------------------------------
## Analysis of Geostatistical Data
## For an Introduction to geoR go to http://www.leg.ufpr.br/geoR
## geoR version 1.8-1 (built on 2020-02-08) is now loaded
## --------------------------------------------------------------
## Warning: package 'spdep' was built under R version 4.1.3
## Loading required package: spData
## Warning: package 'spData' was built under R version 4.1.3
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
## Loading required package: sf
## Warning: package 'sf' was built under R version 4.1.3
## Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1; sf_use_s2() is TRUE
## Warning: package 'spaMM' was built under R version 4.1.3
## Registered S3 methods overwritten by 'registry':
## method from
## print.registry_field proxy
## print.registry_entry proxy
## spaMM (Rousset & Ferdy, 2014, version 3.11.14) is loaded.
## Type 'help(spaMM)' for a short introduction,
## 'news(package='spaMM')' for news,
## and 'citation('spaMM')' for proper citation.
## Further infos, slides, etc. at https://gitlab.mbb.univ-montp2.fr/francois/spamm-ref.
## Warning: package 'ggplot2' was built under R version 4.1.2
ken_sth <- read.csv("Kenya_STH.csv")
ken_sth$asc_prev <- ken_sth$asc_np/ken_sth$stoolsexaminedgeo
ken_sth_SPDF <- SpatialPointsDataFrame(coords = ken_sth[,c("long", "lat")],
data = ken_sth[,c("stoolsexaminedgeo", "asc_np","asc_prev")],
proj4string = CRS("+init=epsg:4326"))
ken_adm <- getData(name = "GADM",country="KEN",level=1)
climate <- getData('worldclim', var='bio', res=2.5)
# Downloaded from https://ngdc.noaa.gov/eog/dmsp/downloadV4composites.html
nl <- raster("F182010.v4d_web.avg_vis.tif")
ken_sth_SPDF$temp <- extract(climate$bio1, ken_sth_SPDF)
ken_sth_SPDF$nl <- extract(nl, ken_sth_SPDF)
ken_sth$temp <- extract(climate$bio1, ken_sth_SPDF)
ken_sth$nl <- extract(nl, ken_sth_SPDF)
The map below plots Roundworms Prevalence. We see that Busia region has a higher prevalence compared to Siaya region.
colorPal <- colorNumeric(wes_palette("Zissou1")[1:5], ken_sth_SPDF$asc_prev, n = 5)
leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
addPolygons(data=ken_adm, weight = 2, fillOpacity=0,
popup = ken_adm$NAME_1,
color = "gray") %>%
addCircleMarkers(data=ken_sth_SPDF,
color = colorPal(ken_sth_SPDF$asc_prev),
radius = 2,
popup = as.character(ken_sth_SPDF$asc_prev)) %>%
addLegend(pal = colorPal,
title = "Roundworm Prevalence",
values = ken_sth_SPDF$asc_prev)
The map below plots Mean Annual Temparature. Temperature at the coastlines seem to be higher than at the inlands.
colorPal <- colorNumeric(wes_palette("Zissou1")[1:5], ken_sth_SPDF$temp, n = 5)
leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
addPolygons(data=ken_adm, weight = 2, fillOpacity=0,
popup = ken_adm$NAME_1,
color = "gray") %>%
addCircleMarkers(data=ken_sth_SPDF,
color = colorPal(ken_sth_SPDF$temp),
radius = 2,
popup = as.character(ken_sth_SPDF$temp)) %>%
addLegend(pal = colorPal,
title = "Nightlights",
values = ken_sth_SPDF$temp)
The map below plots Nightlights. Higher values indicate higher levels of economic activity and wealth (and lower levels of poverty).
colorPal <- colorNumeric(wes_palette("Zissou1")[5:1], ken_sth_SPDF$nl, n = 5)
leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
addPolygons(data=ken_adm, weight = 2, fillOpacity=0,
popup = ken_adm$NAME_1,
color = "gray") %>%
addCircleMarkers(data=ken_sth_SPDF,
color = colorPal(ken_sth_SPDF$nl),
radius = 2,
popup = as.character(ken_sth_SPDF$nl)) %>%
addLegend(pal = colorPal,
title = "Nightlights",
values = ken_sth_SPDF$nl)
We see that the p-value of Global Moran’s I is
9.536816e-13, which indicates a statistically significant
Moran’s I statistic. We can conclude that our data does have global
clustering.
The graph below plots the Moran’s I at different spatial lags.
KE.dists <- as.matrix(dist(cbind(ken_sth$long, ken_sth$lat)))
KE.dists.inv <- 1/KE.dists
diag(KE.dists.inv) <- 0
Moran.I(ken_sth$asc_prev, KE.dists.inv)
## $observed
## [1] 0.2104991
##
## $expected
## [1] -0.01515152
##
## $sd
## [1] 0.03161684
##
## $p.value
## [1] 9.536816e-13
maxDist<-max(dist(cbind(ken_sth$long, ken_sth$lat)))
xy=cbind(ken_sth$long, ken_sth$lat)
pgi.cor <- correlog(coords=xy, z=ken_sth$asc_prev, method="Moran", nbclass=10)
pgi_cor <- plot(pgi.cor)
We see that the Moran’s I statistic is highest at the
0.22 lag, with an I Statistic of 0.34 and p-value of
4.445339e-07 which is statistically significant. This
suggests that that clustering is highest at the 0.22
spatial lag.
We find six points that are statistically significant outliers using the p-values of Local Moran’s I Statistic. These points are visualized in the plot below.
coords<-coordinates(xy)
IDs<-row.names(as.data.frame(coords))
Neigh_nb<-knn2nb(knearneigh(coords, k=1, longlat = TRUE), row.names=IDs)
dsts<-unlist(nbdists(Neigh_nb,coords))
max_1nn<-max(dsts)
Neigh_kd1<-dnearneigh(coords,d1=0, d2=max_1nn, row.names=IDs)
weights<-nb2listw(Neigh_kd1, style="W")
I <-localmoran(ken_sth$asc_prev, weights)
Coef<-printCoefmat(data.frame(I[IDs,], row.names=row.names(coords),
check.names=FALSE))
## Ii E.Ii Var.Ii Z.Ii Pr(z != E(Ii))
## [1,] 1.2089e+00 -6.7779e-02 1.1398e-01 3.7816 0.0001558 ***
## [2,] 4.1688e-01 -6.7469e-03 1.1328e-02 3.9802 6.887e-05 ***
## [3,] 1.1101e+00 -8.4877e-02 1.1565e-01 3.5139 0.0004415 ***
## [4,] 1.2863e+00 -1.2322e-01 1.6086e-01 3.5145 0.0004406 ***
## [5,] 7.0497e-01 -2.4159e-02 3.7386e-02 3.7709 0.0001626 ***
## [6,] -8.6766e-01 -2.2591e-02 3.9829e-02 -4.2344 2.292e-05 ***
## [7,] 1.4968e-02 -6.5790e-06 1.2678e-05 4.2055 2.605e-05 ***
## [8,] -1.5923e-01 -7.2343e-04 1.3931e-03 -4.2467 2.170e-05 ***
## [9,] -1.2093e-01 -4.1986e-04 8.0876e-04 -4.2374 2.261e-05 ***
## [10,] -1.9024e-01 -1.2160e-03 2.1909e-03 -4.0383 5.384e-05 ***
## [11,] 1.0238e+00 -3.7500e-02 6.9555e-02 4.0243 5.715e-05 ***
## [12,] -6.0074e-01 -9.6220e-03 1.8364e-02 -4.3621 1.288e-05 ***
## [13,] 4.6619e-01 -1.2006e-02 1.7661e-02 3.5984 0.0003202 ***
## [14,] 1.1410e-01 -4.6346e-04 8.3563e-04 3.9633 7.392e-05 ***
## [15,] 1.2787e+00 -6.2197e-02 1.1240e-01 3.9994 6.350e-05 ***
## [16,] 8.7116e-01 -3.8653e-02 5.8927e-02 3.7480 0.0001783 ***
## [17,] 3.8783e-01 -6.7469e-03 1.0627e-02 3.8276 0.0001294 ***
## [18,] 6.1997e-01 -1.2612e-02 2.3997e-02 4.0835 4.436e-05 ***
## [19,] 3.9664e-02 -4.6393e-05 8.9400e-05 4.1998 2.671e-05 ***
## [20,] 2.8877e-04 -2.4428e-09 4.7074e-09 4.2089 2.567e-05 ***
## [21,] -2.9746e-02 -2.5790e-05 4.9698e-05 -4.2158 2.489e-05 ***
## [22,] 3.4355e-01 -3.6727e-03 7.0516e-03 4.1349 3.551e-05 ***
## [23,] 1.0378e+00 -3.8653e-02 7.1609e-02 4.0226 5.755e-05 ***
## [24,] -1.4188e-03 -5.8953e-08 1.1361e-07 -4.2092 2.562e-05 ***
## [25,] -4.1314e-02 -7.6116e-03 3.8930e-03 -0.5402 0.5890900
## [26,] 4.9355e-02 -8.4690e-03 7.2130e-03 0.6808 0.4959666
## [27,] 6.9901e-03 -3.1041e-03 2.0733e-03 0.2217 0.8245557
## [28,] 3.9405e-03 -3.8988e-03 2.6020e-03 0.1537 0.8778602
## [29,] -7.2187e-03 -8.3141e-06 8.0658e-06 -2.5388 0.0111223 *
## [30,] 2.1400e-01 -8.5176e-03 8.1928e-03 2.4584 0.0139569 *
## [31,] -2.4077e-01 -8.1146e-03 7.8084e-03 -2.6328 0.0084672 **
## [32,] -1.6955e-01 -4.1769e-03 4.0352e-03 -2.6033 0.0092340 **
## [33,] 1.8256e-01 -6.0510e-03 5.8348e-03 2.4692 0.0135408 *
## [34,] 3.3159e-01 -2.2591e-02 2.1421e-02 2.4199 0.0155232 *
## [35,] 2.9920e-01 -1.7867e-02 1.7023e-02 2.4301 0.0150926 *
## [36,] 1.9561e-01 -7.0157e-03 6.7584e-03 2.4647 0.0137131 *
## [37,] 1.6922e-01 -5.1474e-03 4.9680e-03 2.4739 0.0133652 *
## [38,] 2.6149e-01 -1.3214e-02 1.2650e-02 2.4424 0.0145893 *
## [39,] -2.8142e-01 -1.0862e-02 1.0423e-02 -2.6501 0.0080471 **
## [40,] -9.8343e-02 -1.4618e-03 1.4161e-03 -2.5745 0.0100389 *
## [41,] 3.4043e-02 -1.8978e-04 1.8408e-04 2.5231 0.0116311 *
## [42,] 2.6575e-01 -1.3696e-02 1.3105e-02 2.4410 0.0146461 *
## [43,] 9.0190e-02 -1.3824e-03 1.3392e-03 2.5023 0.0123400 *
## [44,] 1.8256e-01 -6.0510e-03 5.8348e-03 2.4692 0.0135408 *
## [45,] 3.0187e-01 -1.8229e-02 1.7362e-02 2.4293 0.0151281 *
## [46,] -2.2739e-01 -7.2877e-03 7.0185e-03 -2.6272 0.0086083 **
## [47,] 6.6097e-02 -7.3054e-04 7.0820e-04 2.5112 0.0120336 *
## [48,] 1.0978e-01 -2.0757e-03 2.0095e-03 2.4951 0.0125913 *
## [49,] 1.1176e-01 -2.1543e-03 2.0855e-03 2.4944 0.0126169 *
## [50,] 8.3347e-02 -1.1751e-03 1.1387e-03 2.5048 0.0122526 *
## [51,] -1.0115e+00 -1.0471e-01 9.0944e-02 -3.0071 0.0026379 **
## [52,] 1.8654e-01 -6.3363e-03 6.1081e-03 2.4678 0.0135933 *
## [53,] 1.6922e-01 -5.1474e-03 4.9680e-03 2.4739 0.0133652 *
## [54,] -5.3555e-02 -4.4497e-04 4.3149e-04 -2.5568 0.0105642 *
## [55,] -6.0371e-02 -5.6317e-04 5.4604e-04 -2.5595 0.0104833 *
## [56,] 1.4910e+00 -2.2591e-02 1.4794e+00 1.2444 0.2133488
## [57,] 3.0103e-01 -1.8115e-02 1.7255e-02 2.4296 0.0151169 *
## [58,] 1.2449e-01 -2.6973e-03 2.6097e-03 2.4898 0.0127815 *
## [59,] 2.4689e-01 -1.1638e-02 1.1159e-02 2.4473 0.0143945 *
## [60,] 3.6060e-01 -2.2591e-02 2.4182e-02 2.4642 0.0137330 *
## [61,] 1.4910e+00 -2.2591e-02 1.4794e+00 1.2444 0.2133488
## [62,] -2.4282e-01 -5.9917e-04 1.9752e-02 -1.7235 0.0847993 .
## [63,] 6.2408e-01 -2.2591e-02 7.2831e-01 0.7577 0.4486010
## [64,] -2.4282e-01 -2.2591e-02 1.4794e+00 -0.1811 0.8563161
## [65,] 2.4541e-01 -6.8957e-03 2.2588e-01 0.5309 0.5955063
## [66,] -6.0856e-02 -1.0189e-04 3.3606e-03 -1.0480 0.2946295
## [67,] 2.3988e-01 -9.9302e-03 3.2429e-01 0.4387 0.6608960
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
nci<-moran.plot(ken_sth$asc_prev, listw=weights,
xlab="Prevalence", ylab="Spatially lagged prev", labels=T, pch=16, col="grey")
text(c(0.8,0.8, 0.05,0.05),c(0.4, 0,0.4,0), c("High-High", "High-Low", "Low-High", "Low-Low"), cex=0.8)
nci <- moran.plot(ken_sth$asc_prev, listw=weights,
xlab="Prevalence", ylab="Spatially lagged prev", labels=T, pch=16, col="grey")
text(c(0.8,0.8, 0.05,0.05),c(0.4, 0,0.4,0), c("High-High", "High-Low", "Low-High", "Low-Low"), cex=0.8)
We can also plot these points on the map below.
infl<-nci$is_inf==T
x<-ken_sth$asc_prev
lhx<-cut(x, breaks=c(min(x), mean(x), max(x)), labels=c("L", "H"), include.lowest=T)
wx<-stats::lag(weights,ken_sth$asc_prev)
lhwx<-cut(wx, breaks=c(min(wx), mean(wx), max(wx)), labels=c("L", "H"), include.lowest=T)
lhlh<-interaction(lhx,lhwx,infl,drop=T)
names<-rep("none", length(lhlh))
names[lhlh=="L.L.TRUE"]<-"LL"
names[lhlh=="H.L.TRUE"]<-"HL"
names[lhlh=="L.H.TRUE"]<-"LH"
names[lhlh=="H.H.TRUE"]<-"HH"
ken_sth_localM<-as.data.frame(cbind(xy,names))
colnames(ken_sth_localM)<-c("long", "lat", "names")
ken_sth_localM[c("long", "lat")] <- lapply( ken_sth_localM[c("long", "lat")], function(x) as.numeric(as.character(x)) )
factpal <- colorFactor(c( "cyan4","coral4","coral","cyan","lightgrey"), names)
leaflet_local_clustering <- leaflet(ken_sth_localM) %>% addTiles() %>% addCircleMarkers(~long, ~lat, fillOpacity=1,
color= ~factpal(names), radius=4, stroke=TRUE, weight=1) %>%
addLegend(pal = factpal, values = ~names, title="Class")
leaflet_local_clustering
The High-High points are primarily found in the Busia district, whereas the High-Low points are found in the Siaya and the Kisii district.
We first run non-spatial regression and visualize the autocorrelation in the residuals. Below is a summary of the model.
m1 <- spaMM::fitme(cbind(asc_np, stoolsexaminedgeo - asc_np) ~ temp + nl, data=ken_sth, family=binomial())
summary(m1)
## formula: cbind(asc_np, stoolsexaminedgeo - asc_np) ~ temp + nl
## Estimation of fixed effects by ML.
## family: binomial( link = logit )
## ------------ Fixed effects (beta) ------------
## Estimate Cond. SE t-value
## (Intercept) -4.31280 0.665459 -6.481
## temp 0.01718 0.002789 6.160
## nl -0.13152 0.053824 -2.444
## ------------- Likelihood values -------------
## logLik
## p(h) (Likelihood): -857.4244
We see that nightlights has a significant relationship whereas temperature does not. But we need to first check if there is any spatial autocorrelation in the errors.
Below we plot the Moran’s I Statistic for the residuals.
nbc <- 10
cor_r <- pgirmess::correlog(coords = ken_sth[,c("long", "lat")],
z = residuals(m1),
method="Moran", nbclass=nbc)
correlograms <- as.data.frame(cor_r)
correlograms$variable <- "residuals_glm"
global_plot <- ggplot(subset(correlograms, variable=="residuals_glm"), aes(dist.class, coef)) +
geom_hline(yintercept = 0, col="grey") +
geom_line(col="steelblue") +
geom_point(col="steelblue") +
xlab("distance") +
ylab("Moran's coefficient")+
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))
global_plot
We see that there is some spatial autocorrelation.
We see a Moran’s I statistic of 0.075 at a lag of
0.37 with a p-value of 0.096 which is
significant at the \(10\)% level. This
suggests that we may need a spatial model.
We next build a spatial regression model.
In the spatial regression model, We look for autocorrelation of errors in the spatial model using Moran I Statistic and find that it is not statistically significant at the 10% level for any spatial lag. This suggests that a spatial model is appropriate.
m2 <- spaMM::fitme(cbind(asc_np, stoolsexaminedgeo - asc_np) ~ temp + nl + Matern(1|lat+long), data=ken_sth, family=binomial())
summary(m2)
## formula: cbind(asc_np, stoolsexaminedgeo - asc_np) ~ temp + nl + Matern(1 |
## lat + long)
## Estimation of corrPars and lambda by Laplace ML approximation (p_v).
## Estimation of fixed effects by Laplace ML approximation (p_v).
## Estimation of lambda by 'outer' ML, maximizing p_v.
## family: binomial( link = logit )
## ------------ Fixed effects (beta) ------------
## Estimate Cond. SE t-value
## (Intercept) -4.488637 10.93425 -0.41051
## temp 0.004825 0.05138 0.09390
## nl -0.007482 0.22724 -0.03292
## --------------- Random effects ---------------
## Family: gaussian( link = identity )
## --- Correlation parameters:
## 1.nu 1.rho
## 0.6552121 6.6581468
## --- Variance parameters ('lambda'):
## lambda = var(u) for u ~ Gaussian;
## lat + long : 9.749
## # of obs: 67; # of groups: lat + long, 67
## ------------- Likelihood values -------------
## logLik
## p_v(h) (marginal L): -238.6128
plot(ken_sth$asc_np, predict(m2))
summary(m2)
## formula: cbind(asc_np, stoolsexaminedgeo - asc_np) ~ temp + nl + Matern(1 |
## lat + long)
## Estimation of corrPars and lambda by Laplace ML approximation (p_v).
## Estimation of fixed effects by Laplace ML approximation (p_v).
## Estimation of lambda by 'outer' ML, maximizing p_v.
## family: binomial( link = logit )
## ------------ Fixed effects (beta) ------------
## Estimate Cond. SE t-value
## (Intercept) -4.488637 10.93425 -0.41051
## temp 0.004825 0.05138 0.09390
## nl -0.007482 0.22724 -0.03292
## --------------- Random effects ---------------
## Family: gaussian( link = identity )
## --- Correlation parameters:
## 1.nu 1.rho
## 0.6552121 6.6581468
## --- Variance parameters ('lambda'):
## lambda = var(u) for u ~ Gaussian;
## lat + long : 9.749
## # of obs: 67; # of groups: lat + long, 67
## ------------- Likelihood values -------------
## logLik
## p_v(h) (marginal L): -238.6128
cor_g <- pgirmess::correlog(coords=ken_sth[,c("long", "lat")],
z=residuals(m2),
method="Moran", nbclass=nbc)
cor_g
## Moran I statistic
## dist.class coef p.value n
## [1,] 0.07951259 -0.063434261 0.8293129 1392
## [2,] 0.22586721 0.052174806 0.1703072 344
## [3,] 0.37222133 0.045657898 0.1862588 490
## [4,] 0.51857545 -0.026783206 0.5872859 1268
## [5,] 0.66492958 -0.003489024 0.4327243 406
## [6,] 0.81128370 -0.012964708 0.4809194 200
## [7,] 0.95763782 -0.032108064 0.5675438 214
## [8,] 1.10399194 0.037002217 0.2828800 46
## [9,] 1.25034606 -0.097053540 0.6830497 54
## [10,] 1.39670019 -0.183882944 0.4398920 8
cor_g <- as.data.frame(cor_g)
as.data.frame(cor_g)
We can also look at the correlogram plot and we see that the spatial model has Moran’s I statistic that are closer to zero than the non-spatial model.
We now see the coefficients for the covariates.
## formula: cbind(asc_np, stoolsexaminedgeo - asc_np) ~ temp + nl + Matern(1 |
## lat + long)
## Estimation of corrPars and lambda by Laplace ML approximation (p_v).
## Estimation of fixed effects by Laplace ML approximation (p_v).
## Estimation of lambda by 'outer' ML, maximizing p_v.
## family: binomial( link = logit )
## ------------ Fixed effects (beta) ------------
## Estimate Cond. SE t-value
## (Intercept) -4.488637 10.93425 -0.41051
## temp 0.004825 0.05138 0.09390
## nl -0.007482 0.22724 -0.03292
## --------------- Random effects ---------------
## Family: gaussian( link = identity )
## --- Correlation parameters:
## 1.nu 1.rho
## 0.6552121 6.6581468
## --- Variance parameters ('lambda'):
## lambda = var(u) for u ~ Gaussian;
## lat + long : 9.749
## # of obs: 67; # of groups: lat + long, 67
## ------------- Likelihood values -------------
## logLik
## p_v(h) (marginal L): -238.6128
None of the covariates are significant in the spatial model.
The analysis also allows us to make Roundworm prevalence predictions. We plot below the predictions from the spatial and the non-spatial model.
Predictions from the non-spatial model:
Predictions from the spatial model:
# Spatial Model
predicted_prevalence_raster <- predict(pred_stack, m2)
predicted_prevalence_raster_ken <- mask(predicted_prevalence_raster, ken_adm)
plot(predicted_prevalence_raster_ken,main="Roundworm Prevalence (Spatial Model)")
lines(ken_adm)
# Discussion
We identify a significant amount of spatial autocorrelation in the disease prevalence. We identified four points, all in the Busia region of Kenya, that have high values for local clustering indicating that this region has a higher prevalence of disease.
This finding is also reflected in the regression model. The non-spatial model has a significant amount of spatial autocorrelation which goes away when we model using a spatial regression model.
In the non-spatial model, the covariate for nightlights is statistically significant whereas temperature is not statistically significant. However, in the spatial model, none of the covariates are statistically significant. This may be because the spatial attributes explain more variance than can be explained by the covariates. There might more spatially correlated covariates that affect the disease prevalence and further analysis is required that explores additional covariates.
We conduct a spatial analysis of Roundworm prevalence in Kenya and find high levels of spatial clustering. We find that nightlights have a statistically significant effect on the prevalence, which however goes away, when we control for spatial autocorrelation. Further analysis that explores additional socio-economic variables is required to understand the drivers of the spatial clustering of prevalence that this analysis has shown.